## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## ── Attaching packages ────────────────────────────────────── tidymodels 0.1.1 ──
## ✓ broom 0.7.1 ✓ recipes 0.1.15
## ✓ dials 0.0.9 ✓ rsample 0.0.8
## ✓ infer 0.5.3 ✓ tune 0.1.2
## ✓ modeldata 0.1.0 ✓ workflows 0.2.1
## ✓ parsnip 0.1.4 ✓ yardstick 0.0.7
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## x scales::discard() masks purrr::discard()
## x dplyr::filter() masks stats::filter()
## x recipes::fixed() masks stringr::fixed()
## x dplyr::lag() masks stats::lag()
## x yardstick::spec() masks readr::spec()
## x recipes::step() masks stats::step()
## Parsed with column specification:
## cols(
## .default = col_double(),
## team = col_character(),
## WSWin = col_character()
## )
## See spec(...) for full column specifications.
Below, each metric is plotted against win rate with a linear model overlaid, together with its residual vs. fitted plot. For the time being, only single-variable regression is considered.
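Each per-metric fit below follows the same pattern. A minimal sketch with `lm()`, broom, and ggplot2 — the frame name `team_stats` and the simulated rows are placeholders standing in for the real parsed CSV, though the column names `win_rate` and `mean_avg` match the output:

```r
library(tidyverse)
library(broom)

# Placeholder data standing in for the real team-season table
set.seed(1)
team_stats <- tibble(
  mean_avg = rnorm(200, 0.26, 0.03),
  win_rate = 2 * mean_avg + rnorm(200, 0, 0.065)
)

# Fit a single-variable linear model and inspect it with broom
fit <- lm(win_rate ~ mean_avg, data = team_stats)
tidy(fit)    # per-term estimates, as printed below
glance(fit)  # model-level statistics (R-squared, AIC, ...)

# Scatterplot with the fitted line overlaid
ggplot(team_stats, aes(mean_avg, win_rate)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Residual vs. fitted plot
augment(fit) %>%
  ggplot(aes(.fitted, .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
```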
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.0166 0.0603 -0.275 7.84e- 1
## 2 mean_avg 1.99 0.232 8.57 1.20e-16
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.125 0.124 0.0667 73.5 1.20e-16 1 665. -1324. -1311.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.0203 0.0427 -0.477 6.34e- 1
## 2 mean_slg 1.25 0.102 12.2 2.57e-30
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.225 0.224 0.0627 149. 2.57e-30 1 696. -1386. -1374.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.265 0.0226 11.8 1.99e-28
## 2 mean_iso 1.50 0.143 10.5 2.21e-23
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.176 0.174 0.0647 110. 2.21e-23 1 680. -1354. -1342.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.103 0.0859 1.20 0.230
## 2 mean_babip 1.33 0.289 4.62 0.00000488
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0399 0.0381 0.0699 21.3 4.88e-6 1 641. -1276. -1263.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.265 0.0617 -4.29 2.10e- 5
## 2 mean_obp 2.34 0.188 12.4 4.51e-31
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.231 0.229 0.0625 154. 4.51e-31 1 698. -1390. -1377.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.232 0.0560 -4.13 4.15e- 5
## 2 mean_w_oba 2.26 0.173 13.1 6.66e-34
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.250 0.248 0.0617 171. 6.66e-34 1 704. -1403. -1390.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.0404 0.0267 -1.51 1.31e- 1
## 2 mean_w_rc 0.00561 0.000276 20.3 7.25e-68
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.446 0.445 0.0530 414. 7.25e-68 1 783. -1559. -1546.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.498 0.00309 161. 0
## 2 mean_bsr 0.0184 0.00367 5.01 0.000000749
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0467 0.0448 0.0696 25.1 7.49e-7 1 643. -1279. -1267.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.469 0.00286 164. 0.
## 2 mean_off 0.00905 0.000467 19.4 3.03e-63
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.423 0.422 0.0542 376. 3.03e-63 1 772. -1538. -1525.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.496 0.00336 148. 0.
## 2 mean_drs 0.00748 0.00114 6.58 1.46e-10
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0966 0.0944 0.0666 43.3 1.46e-10 1 526. -1046. -1034.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.497 0.00338 147. 0
## 2 mean_uzr 0.00615 0.00125 4.92 0.00000124
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0536 0.0514 0.0691 24.2 1.24e-6 1 538. -1071. -1059.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.500 0.00334 150. 0
## 2 mean_uzr150 0.00418 0.000844 4.95 0.00000107
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.0543 0.0521 0.0691 24.5 1.07e-6 1 539. -1071. -1059.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.486 0.00371 131. 0.
## 2 mean_def 0.00794 0.00105 7.52 3.18e-13
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.117 0.115 0.0668 56.6 3.18e-13 1 553. -1101. -1088.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
## `geom_smooth()` using formula 'y ~ x'
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.364 0.00600 60.6 6.67e-236
## 2 mean_war 0.0778 0.00320 24.3 2.20e- 87
## # A tibble: 1 x 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.535 0.534 0.0486 591. 2.20e-87 1 828. -1649. -1636.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
The residual vs. fitted plots show no discernible patterns, suggesting that a linear model is an adequate fit in all cases. Comparing adjusted R-squared values, the models for WAR, wRC, and Off vs. win rate account for the most variance in the data (0.534, 0.445, and 0.422 respectively). The plots also do not suggest any cases where a low R-squared results from a large number of outliers.
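The adjusted R-squared comparison can be assembled programmatically rather than read off each printout. A sketch under the same placeholder-data assumption (the frame `team_stats` and its simulated columns are stand-ins; only two predictors are shown for brevity):

```r
library(tidyverse)
library(broom)

# Placeholder data; the real frame holds one row per team-season
set.seed(1)
team_stats <- tibble(
  mean_war  = rnorm(200, 1.8, 0.8),
  mean_w_rc = rnorm(200, 95, 10),
  win_rate  = 0.36 + 0.078 * mean_war + rnorm(200, 0, 0.05)
)

# Fit win_rate ~ <metric> for each candidate predictor and
# collect the model-level statistics from glance()
fits <- c("mean_war", "mean_w_rc") %>%
  map_dfr(function(p) {
    fit <- lm(reformulate(p, response = "win_rate"), data = team_stats)
    glance(fit) %>% mutate(predictor = p, .before = 1)
  }) %>%
  arrange(desc(adj.r.squared))
fits
```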
Next, we examine how well the models for WAR, wRC, and Off function as predictors of win rate.
## # A tibble: 128 x 3
## .pred win_rate mean_war
## <dbl> <dbl> <dbl>
## 1 0.541 0.488 2.28
## 2 0.484 0.475 1.54
## 3 0.564 0.549 2.58
## 4 0.437 0.333 0.942
## 5 0.630 0.630 3.42
## 6 0.438 0.447 0.961
## 7 0.437 0.512 0.939
## 8 0.512 0.457 1.91
## 9 0.500 0.543 1.75
## 10 0.534 0.580 2.19
## # … with 118 more rows
## # A tibble: 128 x 4
## .pred win_rate mean_war pred_diff
## <dbl> <dbl> <dbl> <dbl>
## 1 0.541 0.488 2.28 0.0531
## 2 0.484 0.475 1.54 0.00831
## 3 0.564 0.549 2.58 0.0150
## 4 0.437 0.333 0.942 0.104
## 5 0.630 0.630 3.42 0.000221
## 6 0.438 0.447 0.961 0.00880
## 7 0.437 0.512 0.939 0.0757
## 8 0.512 0.457 1.91 0.0551
## 9 0.500 0.543 1.75 0.0435
## 10 0.534 0.580 2.19 0.0460
## # … with 118 more rows
## [1] 0.03931678
## [1] 0.03465513
## [1] 0.03475607
## [1] 0.02854189
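The generating code is not shown, but each printed value appears to be a mean prediction error on held-out data. A sketch of one way such numbers could be produced with rsample and parsnip — the `team_stats` frame, the simulated rows, and the 80/20 split are all assumptions:

```r
library(tidymodels)

# Placeholder data standing in for the real team-season table
set.seed(1)
team_stats <- tibble(
  mean_war = rnorm(640, 1.8, 0.8),
  win_rate = 0.36 + 0.078 * mean_war + rnorm(640, 0, 0.05)
)

# Hold out a test set, fit on the training portion
split <- initial_split(team_stats, prop = 0.8)
fit <- linear_reg() %>%
  set_engine("lm") %>%
  fit(win_rate ~ mean_war, data = training(split))

# Predictions alongside the observed values, as in the tibbles above
preds <- predict(fit, testing(split)) %>%
  bind_cols(testing(split)) %>%
  mutate(pred_diff = abs(.pred - win_rate))

# Mean absolute prediction error, one number per model
mean(preds$pred_diff)
```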